System and Method for Protecting User Privacy Using Social Inference Protection Techniques

ABSTRACT

A system and method for protecting user privacy using social inference protection techniques is provided. The system executes a plurality of software modules which model background knowledge associated with one or more users of mobile computing devices; estimate the information entropy of a user attribute, which could include identity, location, profile information, etc.; utilize the information entropy models to predict the social inference risk; and minimize privacy risks by taking a protective action after detecting a high risk.

RELATED APPLICATIONS

This application is a continuation application that claims priority to U.S. Non-Provisional application Ser. No. 12/507,508, filed Jul. 22, 2009, which claims priority to U.S. Provisional Application Ser. No. 61/082,551, filed Jul. 22, 2008, the entire disclosures of which are expressly incorporated by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the protection of the privacy of individuals communicating through computing or telecommunication devices/systems, or interacting with software applications. More specifically, the present invention relates to a system and method for protecting user privacy using social inference protection techniques.

2. Related Art

Social computing relates to any type of computing application in which software serves as an intermediary of social relations. Examples of social computing applications include email, instant messaging, social networking web sites, and photo sharing web sites. Mobile social computing relates to social applications that run on mobile devices. A wide variety of mobile social computing applications exist, many of which leverage location and mobility to provide innovative services. Examples of such applications include the Ulocate system, which allows for real-time tracking of users and provides a list of people within a social network, as well as their locations; the LoveGety system, which provides proximity match alerts when a male LoveGety user and a female LoveGety user are within 15 feet of one another; the ActiveCampus system, which provides maps showing the location of users on campus; the Social Net system, which provides social match alerts inferred from collocation histories; various matching systems which recommend people to people based on similar interests, activities, personalities, etc.; and Twitter, which enables microblogging/citizen journalism.

Many social networking sites such as Facebook and Orkut use user profiles and existing friendships to enable social communications or recommend possible matches. However, using and sharing geotemporal and personal information raises many serious privacy concerns. Examples of categories of potential privacy invasions in mobile social computing systems are:

-   Inappropriate use by administrators (for example, a system administrator may sell personal data without permission);
-   Legal obligations (for example, a system administrator may be forced by an organization such as the police to reveal personal data);
-   Inadequate security;
-   Lack of control over direct revelations (for example, a cell phone application that reveals a person's location to the person's friends, but does so without properly informing the person or giving the person control of this feature);
-   Instantaneous social inference through lack of entropy (for example, when one cell phone shows that Bob is nearby, and only two people with a similar cell phone are visible, one of them must be Bob, thus increasing the chance of identifying him; the example of the student and professor mentioned in the introduction also illustrates this category);
-   Historical social inferences through persistent user observation (for example, two nicknames are repeatedly shown on the first floor of the gym where the gym assistant normally sits, so one of them must be the gym assistant); and
-   Social leveraging of privileged data (for example, David can't access a location, but Jane can, so David asks Jane for the location).

The problem of social inferences, which include instantaneous social inferences and historical social inferences, is of particular concern in social computing. Inference is the process of concluding unrevealed information as a consequence of being presented with authorized information. A well-known example of the inference problem relates to an organization's database of employees, where the relation <NAME, SALARY> is a secret, but user u requests the following two queries: "List the RANK and SALARY of all employees" and "List the NAME and RANK of all employees." Neither query violates the security requirement, because neither contains the secret <NAME, SALARY> pair. But clearly, the user can infer the salaries of the employees using their ranks. Although the inference problem as a threat to database confidentiality is discussed in many studies, mobile social computing raises new classes of more complicated inferences, which we call social inferences. Social inferences are inferences about user information such as identity, location, activities, social relations, and profile information.

The social inference problem can include a wide range of issues. However, any inference that results from using social applications can be made in one of the following two ways:

1) the inferrer uses only the current state of the system, which is based only on the current observation of the system (referred to as "instantaneous inference"); or

2) the inferrer uses the history of her/his observations, or the history of the answers to previous queries (referred to as "historical inference").

Based on the nature of mobile social applications, social inferences are either the result of accessing location-based information or the result of social communications, or both. The first type is referred to herein as "location-related inferences" and the second type is referred to herein as "inferences in online communications." The following examples are illustrative:

1. Instantaneous social inferences in online communications: Cathy chooses a nickname for her profile and hides her real name, but her profile shows that she is a female football player. Since there are only a few female football players at a given school, there is a high chance she can be identified.

2. Instantaneous location-related social inferences: a cell phone shows a few nicknames in a room, and it is known that the room is Professor Smith's office. Therefore, Professor Smith is in his office and one of those few nicknames belongs to him.

3. Historical location-related inferences: Superman2 and Professor Johnson are repeatedly shown in a room, which is known as Professor Johnson's office. It is also known that David is his Ph.D. student. Therefore, Superman2 must be David and he is currently at Professor Johnson's office.

Instantaneous and historical inferences must be predicted differently. However, previous inference prevention methods have not adequately addressed social inferences. This is due, in large part, to the facts that:

1. The sensitivity of user information is dynamic in nature based on the context, such as time and location;

2. Information available to users is not limited to answers obtained from their queries, but includes users' background knowledge (the information users learn outside the database), which often is a premise in many inferences;

3. Information such as life patterns, physical characteristics, and the quality of social relations that are not kept in the database can be inferred from information available to the user (therefore, inferences in such systems are not limited to database attribute disclosures); and

4. Most social inferences are partial inferences, not absolute inferences; i.e., they don't logically result from the premises as in the name-rank-salary example, but they can be guessed as a result of low information entropy.

Extensive research and industry efforts have focused on helping computer users protect their privacy. Researchers have looked at various aspects of privacy enhancement such as ethics of information management, system features, access control systems, security, and database confidentiality protection. These efforts can be classified into four categories, as discussed below: (1) ethics, principles and rules; (2) direct access control systems; (3) security protection; and (4) inference control solutions.

(1) Ethics, Principles, and Rules

In order to properly respond to concerns of ethics, principles and rules, and to protect user privacy, researchers have made various suggestions. In particular, they have mentioned the following provisions for privacy sensitive systems:

-   Provide users with simple and appropriate control and feedback, especially on the ways others can interact with them or access their information;
-   Provide appropriate user confirmation feedback mechanisms;
-   Maintain comfortable personal spaces to protect personal data from surreptitious capture;
-   Provide a decentralized architecture;
-   Provide the possibility of intentional ambiguity and plausible deniability;
-   Assure limited retention of data or disclose the data retention policy;
-   Provide users with enough knowledge of privacy policies; and
-   Give users access to their own stored information.

However, the foregoing provisions do not ensure that data will not be used in any undesired way, or that unnecessary data will not be collected. Therefore, one effort defines the principles of fair information practices as openness and transparency, individual participation, collection limitation, data quality, use limitation, reasonable security, accountability, and explicit consent. Principles for privacy in mobile computing are then set, which consist of notice, choice, proximity, anonymity, security, and access. The aforementioned concerns and suggested requirements all relate to the aforementioned categories of inappropriate use, legal obligations, inadequate security, and poor features.

(2) Direct Access Control Systems

Access control systems provide the user with an interface and directly control people's access to the user or his/her information based on his/her privacy settings. Access control systems with an interface to protect user privacy started with internetworking. Later, they were extended to context-aware and then ubiquitous computing. The earliest work in this area is P3P. P3P enables users to regulate their settings based on different factors including consequence, data-type, retention, purpose, and recipient. Another access control system, critic-based agents for online interactions, watches the user's actions and makes appropriate privacy suggestions. Access control mechanisms for mobile and location-aware computing were introduced later.

In mobile systems, the context is also used as a factor in decision making. Thus, in addition to the factors defined in P3P, such as the recipient, the following aspects of context have been considered:

-   Location of the data owner;
-   Location of the data recipient;
-   Observational accuracy of data/granularity;
-   Persistence of data; and
-   Time.

One system, Confab for mobile computing environments, enables users to set what information is accessible by others on their contact list based on the time of information collection. Similar systems in mobile environments add the time of information collection to the factors of recipient and data-type. Also, a privacy awareness system targeted at mobile computing environments has been implemented, and is designed to create a sense of accountability for users. It allows data collectors to announce and implement data usage policies, and provides data subjects with technical means to keep track of their personal information. Another approach involves a peer-to-peer protocol for collaborative filtering in recommendation systems, which protects the privacy of individual data.

More recently, the use of location data has raised important privacy concerns. In context-aware computing, the Place Lab system has been proposed for a location-enhanced World Wide Web. It assumes a location infrastructure that gives users control over the degree of personal information they release. Another approach relates to the idea of hitchhiking for location-based applications that use location data collected from multiple people to infer such information as whether there is a traffic jam on a bridge. It treats the location as the primary entity of interest. Yet another solution extended P3P to handle context-aware applications and defined a specification for representing user privacy preferences for location and time. In a conceptually similar work, another approach examined a simple classification and clearance scheme for privacy protection. Each context element of any user is assigned a classification level indicating its sensitivity, and accessing users are each assigned clearance values representing levels of trust for the various elements that can be accessed. For better robustness, this approach made a list of access control schemes for specific elements, thus allowing a combination of permissions for read, write and history accesses.

An identity management system for online interactions in a pervasive environment encompassing PDAs has also been proposed. It enables the users to control what pieces of their personal information to reveal in various pre-defined situations such as interacting with a vending machine, doing bank activities, or getting a bus timetable. Not only does the sensitivity of information depend on the context in a mobile system, but the context itself can also be part of the information that requires protection. There have been few attempts to implement systems that do both. One solution suggested a system in which users can define different situations and different faces for themselves, and they can decide who sees what face in which situation. Another solution involves a simulation tool to evaluate architectures parameterized by users' privacy preferences for context-aware systems. Users can set their preferences to protect various types of personal information in various situations. Still other approaches focused on location privacy in pervasive environments, wherein the privacy-protecting framework is based on frequently changing pseudonyms to prevent user identification. Finally, one solution suggested the idea of Virtual Walls, which allow users to control the privacy of their digital information.

Access control systems mostly deal with a lack of control over direct revelations. Since they only help users control direct access to their information and don't prevent inferences, they don't fully protect user privacy.

(3) Security Protection

Security protection handles the following aspects:

-   Availability (services are available to authorized users);
-   Integrity (free from unauthorized manipulation);
-   Confidentiality (only the intended user receives the information);
-   Accountability (actions of an entity must be traced uniquely); and
-   Assurance (assure that the security measures have been properly implemented).

Therefore, security research has explored detection and prevention of many attacks including Reconnaissance, Denial-of-Service, Privilege Escalation, Data Intercept/Alteration, System Use Attacks, and Hijacking. Confidentiality protection is the area that contains most of the previous research on the inference problem. The inference problem is mostly known as a security problem that targets system-based confidentiality. Therefore, suggested solutions often deal with secure database design. There are also methods that evaluate queries to predict any inference risks.

(4) Inference Control Solutions

Inference is commonly known as a threat to database confidentiality. Two kinds of techniques have been proposed to identify and remove inference channels. One technique is to use semantic data modeling methods to locate inference channels in the database design, and then to redesign the database in order to remove these channels. Another technique is to evaluate database queries to understand whether they lead to illegal inferences. Each technique has its own drawbacks. The former has the problem of false positives and negatives, and vulnerability to denial of service attacks. The latter can cause too much computational overhead. Moreover, in a mobile social computing application both can limit the usability of the system, because they can restrictively limit user access to information. Both techniques have been studied for statistical databases, multilevel secure databases, and general purpose databases. A few researchers have addressed this problem via data mining. Since user information and preferences are dynamic in mobile social computing, queries need to be evaluated dynamically, and the first method cannot be used in such systems.

With the development of the World Wide Web, new privacy concerns have surfaced. Most of the current work in access control for web documents relates to developing languages and techniques for XML documents. While these works are useful, additional considerations addressing the problem of indirect accesses via inference channels are required.

Classical information theory has been employed to measure the inference chance. Given two data items x and y, let H(y) denote the entropy of y and H_x(y) denote the entropy of y given x, where entropy is as defined in information theory. Then, the reduction in uncertainty of y given x is defined as follows:

${{Infer}\left( x\rightarrow y \right)} = \frac{{H(y)} - {H_{x}(y)}}{H(y)}$

The value of Infer(x→y) is between 0 and 1, representing how likely it is to derive y given x. If the value is 1, then y can definitely be inferred given x. However, there are serious drawbacks to using this technique:

1. It is difficult, if not impossible, to determine the value of H_x(y); and

2. The computational complexity that is required to draw the inference is ignored. Nevertheless, this formulation has the advantage of presenting the probabilistic nature of inference (i.e., inference is a relative, not an absolute, concept).
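Where both entropies can be estimated, the formulation is mechanical to evaluate. The following minimal sketch (not part of the original disclosure; the function names and the uniform-candidate example are illustrative assumptions) computes Infer(x→y) from two candidate distributions:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def infer(h_y, h_y_given_x):
    """Infer(x -> y) = (H(y) - H_x(y)) / H(y): reduction in uncertainty of y given x."""
    return (h_y - h_y_given_x) / h_y

# Example: y is uniform over 8 candidates; learning x narrows y to 2 candidates.
h_y = entropy([1 / 8] * 8)            # 3.0 bits
h_y_given_x = entropy([1 / 2] * 2)    # 1.0 bit
print(infer(h_y, h_y_given_x))        # ~0.667: x removes two thirds of the uncertainty
```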

Additional research has focused on techniques for anonymization. Anonymity is defined as not having identifying characteristics, such as a name or description of physical appearance, disclosed, so that the participants remain unidentifiable to anyone outside the permitted people promised at the time of informed consent. Recently, new measures of privacy called k-anonymity and L-diversity have gained popularity. K-anonymity is suggested to manage identity inference, while L-diversity is suggested to protect against both identity inference and attribute inference in databases. In a k-anonymized dataset, each record is indistinguishable from at least k−1 other records with respect to certain "identifying" attributes. These techniques can be broadly classified into generalization techniques, generalization with tuple suppression techniques, and data swapping and randomization techniques. Nevertheless, k-anonymized datasets are vulnerable to many inference attacks that draw on knowledge collected outside of the database, and L-diversity is very limited in its assumptions about background knowledge.

Identity inferences in mobile social computing cannot be addressed by the above techniques because:

-   The sensitivity of user information is dynamic in nature based on the context, such as time and location;
-   Information such as life patterns, physical characteristics, and the quality of social relations that are not kept in the database can be inferred from information available to the user; therefore, inferences in such systems are not limited to attribute disclosures; and
-   Users' background knowledge (the information users learn outside the database) is a premise in many inferences.

The present invention addresses the foregoing shortcomings by providing a system and method for protecting user privacy using social inference protection techniques.

SUMMARY OF THE INVENTION

The present invention relates to a system and method for protecting user privacy using social inference protection techniques. The present invention could be implemented as a software application executing on a server or a networked computing device (e.g., a mobile computing device forming part of one or more computer networks). The software application includes a plurality of modules which execute the following steps: (1) modeling of the inferrer's background knowledge (which typically includes the social context); (2) keeping a record of information revealed in the past, such as the answers to previous queries; (3) estimating the information entropy of a user attribute, which could include identity, location, profile information, etc.; (4) utilizing the information entropy models to predict the social inference risk; and (5) minimizing privacy risks by taking a proper action after detecting a high risk. Actions taken by the system to minimize privacy risks include, but are not limited to, informing users about current privacy risks through visualizations, providing users with a history of what they have revealed, reminding users of current privacy policies, enacting privacy policies that would prevent the unwanted exchange of information, etc.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing features of the invention will be apparent from the following Detailed Description of the Invention, taken in connection with the accompanying drawings, in which:

FIG. 1A is a diagram showing hardware and software components of the present invention, implemented in a client/server environment;

FIG. 1B is a diagram showing the present invention implemented as a software application executing on a computing device;

FIG. 2 is a diagram showing a time domain discretization technique for processing user queries;

FIG. 3 is a flowchart showing processing steps implemented by the present invention for protecting user privacy by mitigating social inferences;

FIG. 4 is a flowchart showing processing steps according to the present invention for protecting user privacy in a computer-mediated environment;

FIG. 5 is a flowchart showing processing steps according to the present invention for protecting user privacy in co-presence and proximity-based applications;

FIG. 6 is a flowchart showing processing steps according to the present invention for protecting user privacy during handling of queries for information issued by users;

FIGS. 7A-7D are simulation results showing various privacy risks associated with computer-mediated communications systems;

FIG. 8 is a diagram showing a classification tree system according to the present invention for modeling user context information; and

FIGS. 9-11 are diagrams showing classification results modeled by the system of FIG. 8.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to social inference protection systems and methods, as discussed in detail below with reference to FIGS. 1-11.

FIG. 1A is a diagram showing hardware and software components of the social inference protection system of the present invention, indicated generally at 10, implemented in a client/server environment. As will be discussed in greater detail below, the present invention could be implemented on a server forming part of a client/server environment, or as a standalone software application executing on a computing device forming part of a network.

As shown, the system 10 could be implemented on a computer system (server) in communication with a network 20 (which could include the Internet, a local area network (LAN), wide area network (WAN), metropolitan area network (MAN), etc.), and could include a plurality of software modules 12 a-12 d executed by the system 10, a central data store 14 accessible by the software modules 12 a-12 d, and a network interface 16 for allowing the software modules to communicate with a plurality of computing devices 24 a-24 d via the network 20, which computing devices could be mobile (such as cellular telephones, portable/laptop computers, PDAs, etc.) or fixed. The computing devices 24 a-24 d could be part of networks 22 a-22 b, as shown in FIG. 1A. The networks 22 a-22 b could support one or more social networks and/or social networking websites. It is noted that the computing devices 24 a-24 d need not be mobile, and indeed, could be any computer systems forming part of the networks 22 a-22 b. The software modules 12 a-12 d include a context modeling engine 12 a, an information entropy modeling engine 12 b, a privacy threshold calculation engine 12 c, and a risk minimization engine 12 d.

As will be discussed in greater detail below, the modules 12 a-12 d protect against inferences being made with respect to users of the computing devices 24 a-24 d, so as to protect the privacy of such users. The system 10 could be any suitable computer hardware platform (e.g., a single or multi-processor server having INTEL or other suitable microprocessors), running any suitable operating system (e.g., UNIX, LINUX, SOLARIS, WINDOWS SERVER, MACOS, etc.), without departing from the spirit or scope of the present invention. The network interface 16 could include a firewall for restricting access to the system 10, as well as associated data communications hardware. The data store 14 could include any suitable database management system, such as MYSQL, etc. Further, the software modules 12 a-12 d could be programmed using any suitable, high-level computing language, such as C, C++, C#, Java, etc.

As shown in FIG. 1B, the system 10 could also be implemented as a standalone software application which executes on a computing device 28, which could be mobile or fixed and which is connected to a network (e.g., a LAN, MAN, WAN, the Internet, etc.) via a network interface 29 (which could be wired or wireless). In such circumstances, the system 10 includes the software modules 12 a-12 d and the data store 14, which perform the functions discussed herein. The system 10 could be programmed using any suitable, high-level computing language.

The system 10 gathers and processes information about users of the computing devices 24 a-24 d, and prevents instantaneous social inferences and historical social inferences. Based on information theory, as more information is collected about a user, such as his/her contextual situation, uncertainty about other aspects, such as his/her identity, may be reduced, thereby increasing the probability of correctly guessing these aspects. This probability also depends on the number of entities (e.g., users) that match the collected information. Collected information is not just the information that the present invention provides to users, but also includes information collected outside of the data store 14, or background knowledge. Furthermore, inferred information may be external (outside of the data store 14). Examples include partial identity (e.g., identity at the physical appearance granularity), external profile information, and external social relations. Therefore, social inferences happen when information provided by the system 10, combined with the inferrer's background knowledge, reduces the inferrer's uncertainty about a database attribute or an external attribute to a level at which he/she could guess that attribute.

Inference control can be defined by $\lnot(PK_{A}(Q) \wedge FK_{A}(Q))$, where $PK_{A}(Q)$ means A is permitted to know Q and $FK_{A}(Q)$ means A is forbidden to know Q. For a predefined access control table, the following definition is set forth:

$R(A) = \{ s \in \Lambda \mid l(s) \leq L(A) \}$

$F(A) = \{ s \in \Lambda \mid \lnot(l(s) \leq L(A)) \}$

where Λ is the set of sentences, R(A) is the set of sentences to which A is explicitly permitted to have access, F(A) is the set of sentences to which A is explicitly forbidden to have access, L(A) is the access level of user A, and l(s) is the classification level of sentence s. This definition means that forbidden data are not the data that are specifically forbidden, but the data that are not specifically permitted. R(A) and F(A) cannot be defined for a mobile social computing environment where privacy settings may be highly dynamic. However, the following implication can be used:

$[PK_{A}(Q) \wedge PK_{A}(Q \Rightarrow \Phi)] \Rightarrow PK_{A}(\Phi)$

Consequently, if we want Φ to be forbidden for A and Φ can be inferred from Q, Q should be forbidden for A as well. To understand what determines $Q \Rightarrow \Phi$, we note that, considering partial inferences in a mobile social computing system, Φ may not be logically deduced from Q as indicated by $Q \Rightarrow \Phi$; rather, Φ may belong to the Sphere Of Influence of Q (SOI(Q)). Accordingly, we modify Cuppens' formulation as follows: we define Q to be the information included in the query, its answer, and background knowledge that is modeled as described below. Q is safe to be completely known by A if $\forall \Phi, [(\Phi \in SOI(Q) \wedge PK_{A}(Q)) \Rightarrow PK_{A}(\Phi)]$. In a social computing system, $\Phi \in SOI(Q)$ if knowing Q reduces the uncertainty about Φ and results in a lack of information entropy.

Based upon the foregoing, the modules 12 a-12 d of FIGS. 1A-1B perform the following steps:

-   I. Model background knowledge for the related user social context (e.g., for introduction between strangers), which could be deterministic or probabilistic, using the context modeling engine 12 a. The background information could also be monitored by a system operator or other individual, and the monitored information could be stored in a database; i.e., the context modeling engine 12 a is optional.
-   II. Calculate the information entropy of user context (and the related inference function) for instantaneous inferences using the information entropy modeling engine 12 b, taking background knowledge into account.
-   III. Calculate the information entropy associated with user context for historical inferences using the information entropy modeling engine 12 b, taking the answers to past queries and background knowledge into account.
-   IV. Calculate the privacy thresholds using the privacy threshold calculation engine 12 c, based on the user settings, administration privacy policies, community-set privacy policies, or social and legal norms.
-   V. Determine whether the current or likely future entropy level passes the threshold or violates any of the above policies, using the privacy threshold calculation engine 12 c.
-   VI. Protect users' privacy by taking suitable risk-minimizing actions using the risk minimization engine 12 d.
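The following Python skeleton is one hedged reading of how steps I-VI might be composed into a single evaluation pass. It is a sketch only: the class and method names are invented here for exposition, and the patent does not prescribe any particular API for the engines 12 a-12 d.

```python
class SocialInferenceProtector:
    """Illustrative pipeline tying together steps I-VI; all names are hypothetical."""

    def __init__(self, context_engine, entropy_engine, threshold_engine, risk_engine):
        self.context = context_engine      # engine 12a (optional)
        self.entropy = entropy_engine      # engine 12b
        self.threshold = threshold_engine  # engine 12c
        self.risk = risk_engine            # engine 12d

    def evaluate(self, user, query, answer):
        background = self.context.model_background(user)                   # step I
        h_now = self.entropy.instantaneous(query, answer, background)      # step II
        h_hist = self.entropy.historical(user, query, answer, background)  # step III
        limit = self.threshold.compute(user)                               # step IV
        if min(h_now, h_hist) < limit:                                     # step V
            return self.risk.protect(user, query, answer)                  # step VI
        return answer
```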

The functions performed by each of the software modules 12 a-12 d are discussed below in greater detail.

Context Modeling Engine 12 a

Most previous inference control frameworks are vulnerable to attacks based on background knowledge. Background knowledge is the information available to users outside the database. This information should be assumed to be known by all users, just as the answers to their queries are assumed to be known by them. In order to preserve database integrity, it is necessary to model user knowledge in the outside world.

Background knowledge can be deterministic or probabilistic. Deterministic knowledge is information that the user has gained or can easily access from available sources, while probabilistic knowledge is just a guess. For example, the fact that a specific room on campus is Professor Smith's room is deterministic knowledge, because it is mentioned on the school's website, while guessing someone's gender based only on their chat style is probabilistic knowledge.

To satisfy the inference control conditions in this domain, the context modeling engine 12 a models easy-to-learn information about location as part of A's background knowledge, in addition to the information that the system gives out to him.

In proximity-based applications, background knowledge includes visual information about nearby people and knowledge of their names. Examples of such background knowledge are as follows:

Users' background knowledge about their vicinity:

-   Physical appearance of nearby people;
-   Names and profile information of nearby people whom the inferrer knows; and
-   Nearby people who are users of the social application (i.e., who carry the needed device).

Users' background knowledge about rooms and places:

-   Purpose and schedule of the place;
-   People related to the schedule;
-   Owner or manager of the place; and
-   People related to the manager.

To learn about background knowledge in computer-mediated communications, studies can be run between people communicating on-line. Modeled information can include profiles, a campus directory (in the case of a campus environment), and guesses based on gender and chat style as background knowledge. Background knowledge in this context can be categorized as follows:

-   General demographics and personal information, such as personal profiles;
-   Related organizations' public information about people (such as the school's directory, the school's website, phone directories, and yellow pages);
-   Guesses on gender; and
-   Guesses on ethnicity.

We assume background knowledge is included in the model Q. Hence, Q includes the information in the query and its answer, as well as the modeled information. Now, we can estimate SOI(Q), having all the information modeled in Q. All the information included in Q and its higher granularities needs to be checked. This modeled information is then stored by the context modeling engine 12 a in the data store 14. It is noted that the engine 12 a is optional, and that background information could be acquired through monitoring by a system operator or other individual, and then entered into the data store 14 by such individual.
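As an illustration only, the modeled Q might be represented as a record that bundles the query, its answer, and both kinds of background knowledge; the structure and field names below are assumptions made for exposition, not the patent's schema.

```python
from dataclasses import dataclass, field

@dataclass
class ModeledKnowledge:
    """Hypothetical record for Q: the query, its answer, and modeled background knowledge."""
    query: dict                 # e.g., {"location": "room 210", "time": "now"}
    answer: dict                # what the system is about to reveal
    deterministic: dict = field(default_factory=dict)  # e.g., {"room 210": "Prof. Smith's office"}
    probabilistic: dict = field(default_factory=dict)  # e.g., {"gender_guess_accuracy": 0.108}

    def all_items(self):
        """Everything assumed known to the inferrer, to be checked at all granularities."""
        return {**self.query, **self.answer, **self.deterministic}
```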

Information Entropy Modeling Engine 12 b

Engine 12 b calculates the risk that an attribute Φ is inferred from revealed information Q. We define the inference function as follows:

$\mathrm{INF1}(Q \rightarrow \Phi) = \frac{H_{\max} - H_{c}}{H_{\max}}$   (1)

where H_max represents the maximum entropy for the environment and is fixed for any given application, and H_c is the entropy under the current conditions and is dynamic based on the situation. H_max is calculated as follows:

$H_{\max} = -\sum_{1}^{X} P \cdot \log_{2} P$   (2)

where P = 1/X and X is the maximum number of entities (users) related to the application. H_c is calculated as follows:

$H_{c} = -\sum_{1}^{V} P1 \cdot \log_{2} P1$   (3)

where V is the number of entities whose attribute Φ falls in SOI(Q), and P1 is the probability that each of them is thought to be the correct attribute by the inferrer. For example, if the inferrer can see someone's available profile information, then:

-   Q: (profile information = given profile information & profile identity at nickname granularity or anonymous);
-   Φ: identity at real-name granularity;
-   X: the total number of potential profile users; and
-   V: the number of users who have the same information in their profile.

One of the advantages of this formulation is that we can set the value of H_max in each application in such a way that any given value of INF1 means the same inference chance independent of the application. When INF1 is too high, say larger than C, an appropriate action is taken by the risk minimization engine 12 d. The appropriate action can be rejecting the query, blurring the answer, or sending a warning to the owner of the information.
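A minimal sketch of equations (1)-(3) follows, assuming a uniform distribution over the X potential entities for H_max and an arbitrary candidate distribution for H_c; the function name, the threshold value, and the example numbers are illustrative, not taken from the disclosure.

```python
import math

def inf1(x_total, p1_list):
    """INF1(Q -> Phi) = (H_max - H_c) / H_max, per equations (1)-(3).

    x_total: maximum number of entities for the application (X).
    p1_list: probability the inferrer assigns to each of the V candidates in SOI(Q).
    """
    h_max = math.log2(x_total)  # entropy of a uniform distribution over X entities
    h_c = -sum(p * math.log2(p) for p in p1_list if p > 0)
    return (h_max - h_c) / h_max

# Example: 1000 potential profile users, 3 indistinguishable matching profiles.
risk = inf1(1000, [1 / 3] * 3)
C = 0.7  # illustrative threshold
if risk > C:
    print(f"high inference risk ({risk:.2f}): reject, blur, or warn")
```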

When it comes to anonymity protection, the entropy control model also satisfies k-anonymity under any given condition for any given inferrer. In a k-anonymized dataset, each record is indistinguishable from at least k−1 other records with respect to certain "identifying" attributes. In our model, the above entities and attributes will be users and identities, respectively. If we assume that all the information available to the users is deterministic (which means that if they are able to access the information source, they are either able to know the exact answer or not) and assume that all information available outside the database is included in Q, then P1 in equation (3) equals 1/V and H_c equals −Σ(1/V)·log₂(1/V) = log₂ V. Therefore, for any given application and known condition, INF1 is determined only by the number of users satisfying that condition; i.e., to have an INF1 value smaller than the associated threshold, at least U indistinguishable users are needed in the situation. This satisfies k-anonymity with k=U. However, the present entropy control method is more general than the k-anonymity solution. In particular, it can model a probabilistic model of background knowledge, such as guesses on gender and home country. It can also be used to calculate the risk of inferring other attributes such as location. The following examples illustrate how it is applied to identity protection.
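To make the equivalence concrete (a worked restatement of the argument above, under the same deterministic assumption, with H_max = log₂ X and H_c = log₂ V):

$\mathrm{INF1} = \frac{\log_{2} X - \log_{2} V}{\log_{2} X} < C \iff \log_{2} V > (1 - C)\log_{2} X \iff V > X^{1-C}$

so keeping INF1 below C requires at least U = ⌈X^{1−C}⌉ indistinguishable users, i.e., k-anonymity with k = U.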

A. Location-Related Identity Inference.

Background knowledge includes visual information about the inferrer's vicinity. For example, if the inferrer knows that he or she has got a nearby match in a matching system, then:

-   Q: (time = current time & location = vicinity of the inferrer & physical characteristics match possible physical characteristics of the inferrer's matching interests & nearby people and events);
-   Φ₁: nearby people's identity at physical appearance granularity;
-   X: the total number of potential users of the application; and
-   V: the number of users that satisfy Q, i.e., they are currently in the inferrer's vicinity and match the possible physical characteristics.

If the resulting INF1 value is close to 1, there is a high inference risk and we need to take an action to prevent it. For example, we can use a blurring method: instead of showing the location at room precision, we could show it at floor or building precision, or we can send a warning to the information owner about the high inference risk.

B. Identity Inference in Computer-Mediated Communication

We have also explained background knowledge associated with on-line communications. As an example of a possible inference, Catherine is matched with an anonymous person. During the introduction phase of the match, her match selects two profile items to be revealed: (1) he is Hispanic; and (2) he is a member of the basketball team. Catherine does a search on the profiles and finds only one Hispanic basketball player. Therefore, she can find his name and all his public profile information.

In an anonymous communication, assuming that communication partners A and B are not nearby, the probability that B infers A's identity based on equation (1) is modeled as follows:

-   Q: (profile items matching A's profile items that are already revealed);
-   Φ: partner's identity;
-   X: the total number of potential users of the application; and
-   V: the number of users that satisfy Q.

We define:

-   Group F: users who are of the same sex as A and satisfy Q;
-   Group G: users in the inferrer's vicinity who come from the same country/region as A and satisfy Q;
-   X1: the number of users in group F;
-   X2: the number of users in group G;
-   X3: the number of users in the intersection of F and G (F∩G);
-   ζ: the probability of guessing the right gender from the partner's chat style (which was shown to be 10.8% in our user study); and
-   σ: the probability of guessing the right home country from the partner's chat style (which was shown to be 5.4% in our user study).

If online typing is enabled:

$P1 = \begin{cases} \zeta\sigma/X3 + \zeta(1-\sigma)/X1 + (1-\zeta)\sigma/X2 & \text{for } F \cap G \\ \zeta(1-\sigma)/X1 + (1-\zeta)(1-\sigma)/V & \text{for } F - F \cap G \\ (1-\zeta)\sigma/X2 + (1-\zeta)(1-\sigma)/V & \text{for } G - F \cap G \\ (1-\zeta)(1-\sigma)/V & \text{for the rest of nearby users} \end{cases}$   (5)

If their communication is carried out only through revealing profile items and no typing (and therefore no guessing based on chat style) is involved:

$P1 = 1/V$   (6)

which simplifies into a k-anonymity solution, and if the number of users in a specific situation is less than a specific number, say U, there is a high chance of inference.
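The following is a hedged sketch of equation (5) in code: it computes the per-candidate probabilities for the four groups and the resulting identity entropy H_c. The function name and example group sizes are illustrative; the study's reported guess rates (ζ = 10.8%, σ = 5.4%) are plugged in only for the demonstration.

```python
import math

def cmc_identity_entropy(v, x1, x2, x3, zeta, sigma):
    """Identity entropy under equation (5): chat style leaks gender and home-country guesses.

    v: users satisfying Q; x1: same-sex group F; x2: same-country group G;
    x3: users in F intersect G; zeta/sigma: chance of guessing gender/country from chat style.
    """
    both = zeta * sigma / x3 + zeta * (1 - sigma) / x1 + (1 - zeta) * sigma / x2
    f_only = zeta * (1 - sigma) / x1 + (1 - zeta) * (1 - sigma) / v
    g_only = (1 - zeta) * sigma / x2 + (1 - zeta) * (1 - sigma) / v
    rest = (1 - zeta) * (1 - sigma) / v
    # One probability per candidate, grouped exactly as in the cases of equation (5).
    probs = ([both] * x3 + [f_only] * (x1 - x3)
             + [g_only] * (x2 - x3) + [rest] * (v - x1 - x2 + x3))
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example: 40 candidates, 20 same-sex, 8 same-country, 4 in both groups.
print(cmc_identity_entropy(v=40, x1=20, x2=8, x3=4, zeta=0.108, sigma=0.054))
```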

The information entropy modeling engine 12 b also calculates information entropy for historical inferences, as discussed below. Both the modeled current and historical information entropies are stored in the data store 14.

Our model in the previous category applies here as well, but in this category Φ∈SOI(Q) if Q, along with the answers to past queries, leads to a lack of uncertainty that reveals Φ. The inference chance is calculated from an inference formula similar to (1), as described below.

$\begin{matrix}{{{INF}_{i}2\left( Q\rightarrow\Phi \right)} = \begin{Bmatrix}{\frac{H_{\max} - H_{i}}{H_{\max}},} & {{{if}\mspace{14mu} \frac{H_{\max} - H_{i}}{H_{\max}}} < C} \\{{\lambda,}\mspace{14mu}} & {{{if}\mspace{14mu} \frac{H_{\max} - H_{i}}{H_{\max}}} \geq C}\end{Bmatrix}} & (12)\end{matrix}$

where

$H_{\max} = -\sum_{1}^{Y} P \cdot \log_{2} P$

(H_max is again fixed for each application), and

λ = the number of queries that involve the attribute and were sent after (H_max − H_i)/H_max reached the threshold value C. H_i is calculated as follows:

$H_{i} = -\sum_{1}^{V} P1 \cdot \log_{2} P1$   (13)

where P1 is the probability that each of them is thought to be the correct attribute by the inferrer.

V is the number of users that belong to SOI(Q), where Q also includes previous queries, starting at the current time and going back an amount of time equal to a given duration T.

Unlike INF1, INF_i2 is calculated for any time slot i, as illustrated by the time domain discretization graph shown in FIG. 2. INF_i2 is between 0 and 1 until (H_max−H_i)/H_max reaches the threshold value C. When INF_i2 equals C, there is a lack of entropy around Φ, and sending multiple queries involving Φ can lead to an inference. Therefore, at this time INF_i2 starts counting the new queries. After INF_i2 passes a number of queries, say K, the system takes an appropriate action such as dynamic blurring. We consider a discrete finite duration T for past queries, since humans don't have a perfect memory. Thus, we assume they forget the answers to queries sent more than T time units ago. However, to protect the system against inference attacks, the calculation and results can be extended for T→∞. Obviously, if time slots i and j overlap, rejecting a query based on INF_i2 affects the value of INF_j2.
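The following sketch mirrors equation (12) in code; it is an assumption-laden illustration, not the patent's implementation. In particular, it assumes the caller supplies the candidate probabilities for the current slot, already restricted to queries within the memory window T, and it resets the query counter λ whenever entropy recovers.

```python
import math

class HistoricalInferenceTracker:
    """Illustrative sketch of equation (12); names and structure are assumptions.

    candidate_probs must already reflect SOI(Q) over the window of queries sent
    within the last T time units (the assumed span of the inferrer's memory).
    """

    def __init__(self, x_total, c_threshold, k_max_queries):
        self.h_max = math.log2(x_total)  # fixed per application
        self.c = c_threshold             # threshold value C
        self.k = k_max_queries           # K: queries tolerated after the threshold is hit
        self.lam = 0                     # lambda: queries counted after reaching C

    def inf2(self, candidate_probs):
        """Return (value, act) for the current time slot i."""
        h_i = -sum(p * math.log2(p) for p in candidate_probs if p > 0)
        ratio = (self.h_max - h_i) / self.h_max
        if ratio < self.c:
            self.lam = 0                 # entropy recovered; stop counting
            return ratio, False
        self.lam += 1                    # lack of entropy around Phi: count new queries
        return self.lam, self.lam > self.k  # act (e.g., dynamic blurring) after K queries
```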

Privacy Threshold Calculation Engine 12 c

Since H_max is fixed for each application, the value of INF1 in a specific application depends only on H_c. We can set the value of H_max in each application in such a way that any given value of INF1 means the same inference chance independent of the application. By selecting the appropriate H_max for any given application, based on equation (1), INF1 being too high (say, larger than C) is equivalent to H_c being lower than an associated threshold in the related application:

$\mathrm{INF1} > C \iff H_{c} < H_{\max}(1 - C)$

$\text{threshold} = H_{\max}(1 - C)$   (4)

There are different ways to use estimated thresholds of information entropy to protect users' privacy. One way is to use privacy preferences set by a user or a group of users to calculate the threshold as stated in equation (4). Such a calculation is performed by the privacy threshold calculation engine 12 c, and stored in the data store 14.

It is also possible to provide individuals or groups with ways of setting new privacy preferences that relate directly to inference control. One example of inference-related privacy settings is setting a desired degree of anonymity in anonymous on-line communications and CMC. There are more ways to evaluate the information entropy threshold based on privacy policies: conventional privacy preferences set by a user or a group, system administrators' policies, or legal and social norms may all influence it. If revealing a piece of information makes the inference function higher than the threshold, revealing the information can violate the user's privacy settings.
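As one concrete (and assumed) reading of equation (4): if a user's desired degree of anonymity is U, the deterministic model above implies that H_c must stay at least log₂ U, which fixes both the entropy threshold and the corresponding ceiling C. The sketch below encodes that reading; the function name and example values are illustrative.

```python
import math

def entropy_threshold(x_total, desired_anonymity_u):
    """Map a desired degree of anonymity U to an entropy threshold and INF1 ceiling C.

    Assumes the deterministic model, where degree-U anonymity requires H_c >= log2(U).
    """
    h_max = math.log2(x_total)
    threshold = math.log2(desired_anonymity_u)  # minimum acceptable H_c, per equation (4)
    c = (h_max - threshold) / h_max             # INF1 > C exactly when H_c < threshold
    return threshold, c

# Example: a campus of 10,000 users and the study's most common preference, U = 2.
print(entropy_threshold(10_000, 2))  # (1.0 bit, C of roughly 0.92)
```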

Risk Minimization Engine 12 d

The risk minimization engine 12 d executes any of the following procedures when the information entropy modeled by the engine 12 b violates the privacy threshold calculated by the engine 12 c:

-   a. Instantiate and deploy software policies for users, groups and/or administrators that disallow the exchange of information beyond threshold levels. Such automatic protection could be achieved by:
    -   Automatic reduction of the precision/granularity of the information to be revealed; and
    -   Automatic blocking of the information to be revealed, such as rejecting the third party's queries.
-   b. Alert users to the privacy risks associated with various user actions based on a calculation of information entropy:
    -   Visual alerts on the phone or the desktop: for example, a pop-up, a small window, flashes, or a change in the color or brightness of the screen which warns a user or a group of users that they are about to pass their threshold;
    -   Auditory alerts on the phone or the desktop, such as buzzes or any sound that informs an individual or a group of users of the risks they are taking; and
    -   Vibration of the phone or the device in risky situations.
-   c. Providing users with information as to their current information entropy. This could be in the form of a continuous visualization of their entropy level (such as their degree of anonymity) on the phone or on the desktop as a tab, menu, or a log box.
-   d. Providing users with a log (history) of the information they have revealed and how it has affected their entropy.
-   e. Reminding users about their current privacy preferences when they are about to communicate or exchange information (visual or auditory reminders).
-   f. User policy adjustments and enactment: let the individual or a group of users change their privacy settings if they get a warning or observe a risk, and act based on the new privacy settings.
-   g. Adjusting system administrative policies if doing so can reduce the inference chance, and acting based on the new policies.

FIG. 3 is a flowchart showing overall processing steps, indicated generally at 30, implemented by the present invention to protect user privacy by mitigating social inferences. The steps shown in FIG. 3 are executed by the modules 12 a-12 d discussed above in connection with FIG. 1. Beginning in step 32, the context modeling engine 12 a determines and stores privacy preferences and user policies associated with one or more of the computing devices 24 a-24 d. Also, the privacy preferences and user policies could be determined by a system operator or other individual. This information is then stored in a privacy policy database table 34. In step 36, entropy thresholds calculated by the engine 12 c are stored in database 38.

In step 40, the system 10 receives queries for information from the computing devices 24 a-24 d, and stores the queries in the data store 14. In step 42, the queries are evaluated relative to the privacy policies stored in the table 34. In step 44, the instantaneous information entropy is then calculated using the information entropy modeling engine 12 b. Also, in step 44, background knowledge stored in table 56 and determined in step 54 is processed. In step 46, the entropy thresholds in the table 38 are compared with the entropy calculated in step 44. In step 48, the module 12 b calculates the historical information entropy of one or more user attributes corresponding to one or more users of the networks 22 a-22 b, and the inference function. Also, in step 48, background knowledge stored in table 56 and determined in step 54 by the context modeling engine 12 a is processed, as well as a table 58 containing a history of revealed information corresponding to the user. In step 50, the calculated historical entropy and historical inference function are compared with their thresholds stored in the table 38. Finally, in step 52, a decision is made, based upon the comparison, as to which action to take to protect the user's privacy (e.g., one or more of the actions discussed above in connection with the risk minimization engine 12 d). This information is then stored in table 58 containing a history of revealed information, which can also be processed in step 48 for future calculations. It is noted that the tables 34, 38, 56, and 58 could form part of the data store 14.

FIG. 4 is a flowchart showing processing steps according to the present invention, indicated generally at 60, for protecting user privacy in a computer-mediated communication environment. These steps are executed by the software modules 12 a-12 d of FIG. 1. As an example of privacy protection in computer-mediated communication, assume that users A and B are anonymously communicating to get to know each other (for example, because they are matched). They start by revealing their personal and profile information, and continue until they decide to fully introduce themselves or leave the chat. If either of them reveals information that reduces the information entropy of some attribute, such as their identity, that attribute can be inferred by the partner. For example, if A reveals that he is a Hispanic female soccer player and there is only one Hispanic female soccer player in their community, B can infer A's identity.

In step 62, privacy policies associated with users of one or more of the mobile computing devices 24 a-24 d are determined and stored in a privacy policy table 64, as well as preferences that the users have for each piece of personal information, and a desired degree of anonymity specified by the users. In step 66, information entropy thresholds are estimated, using the engine 12 c. The estimated entropy levels are then stored in table 68. In step 70, the system 10 determines an item of personal information associated with one or more of the users, which the user is about to reveal. In step 72, the privacy policies in the table 64 are evaluated relative to the personal information about to be revealed. In step 74, the identity entropy of the information is calculated using the engine 12 b. In this step, additional information about the user and his/her environment stored in tables 80-86, such as personal profiles, on-line directory information, stored directory information, and revealed personal information, is also processed during the calculation. In step 76, the results of the calculation are compared with the stored entropy thresholds in the table 68. Then, in step 78, the risk minimization engine 12 d makes a determination as to which action to take (e.g., one or more of the privacy actions discussed above) to protect the user's privacy. This determination is then stored in the table 86. It is noted that the tables 64, 68, 80, 82, 84, and 86 could be part of the data store 14.

FIG. 5 is a flowchart showing processing steps according to the present invention, indicated generally at 90, for protecting user privacy in co-presence and proximity-based applications. These steps are executed by the modules 12 a-12 d of FIG. 1. As mentioned earlier, in co-location and proximity-based applications, users have visual knowledge about their vicinity, memory of the visual knowledge of their vicinity, and some knowledge of their acquaintances' personal profiles. For example, assume that Alice is in Bob's vicinity and they don't know each other. If there are no other females near Bob and he is told that "Alice" or "a female looking for romance" is there, he will infer that she is the only girl he sees in the room. Even if Alice has the option to set her privacy settings, she may not be able to predict all these risks. Therefore, after her privacy preferences are set and stored in a database, as well as other privacy policies, inference thresholds are calculated based on these privacy policies (e.g., Alice's identity threshold can be calculated based on her desired degree of anonymity). These thresholds can be stored in an inference management database. Bob's background knowledge related to his vicinity is then estimated, and Alice's information entropy is calculated based on it. The next step is to take possible risk minimization actions. For example, if Alice's instantaneous or historical identity entropy is less than her identity entropy threshold, the system can blur the information shown to Bob so that Bob is presented with an "anonymous user" instead of "Alice" or a "female looking for romance".

In step 92, entropy thresholds are estimated by the engine 12 c, and the estimations are stored in table 96. In step 94, a determination is made as to how one person (e.g., Alice, in the example above) is to be presented to another person (e.g., Bob). This determination is made by referring to the policy and anonymity information stored in tables 108 and 110 (which, optionally, could be determined and stored in the tables 108 and 110 using a dedicated privacy policy capture engine forming part of the system 10 of the present invention). In step 98, estimates of the person's (e.g., Alice's) instantaneous identity entropy are made by the engine 12 b, with reference to nearby people 112 and the other person's (e.g., Bob's) nearby friends and associates 114 (historical information about which is also stored in tables 116 and 118, respectively).

In step 100, the estimated instantaneous identity entropy is compared with the entropy thresholds stored in the table 96. In step 102, the historical identity entropy of the person (e.g., Alice), as well as the inference function, is calculated based upon the historical information stored in the tables 116 and 118 and a history of information revealed about the person and stored in table 120. In step 104, the estimated historical identity entropy and the historical inference function are then compared with their thresholds stored in the table 96. Finally, in step 106, one or more of the actions discussed above is taken by the risk minimization engine 12 d to protect the user's privacy. This information is also stored in the table 120. It is noted that the tables 96, 108, 110, 116, 118, and 120 could be part of the data store 14.

FIG. 6 is a flowchart showing processing steps according to the present invention, indicated generally at 130, for protecting user privacy during handling of queries for information issued by users. In step 132, a query for information is received by the system 10 from one or more users of the mobile computing devices 24 a-24 d. Then, in step 134, a determination is made as to whether the answer to the query violates user privacy policies. If so, the answer is rejected or blurred (i.e., part of the answer is hidden so as to preserve user privacy). Otherwise, in step 138, a determination is made as to whether there is a high chance of an instantaneous inference being made based upon the answer. If a positive determination is made, step 140 occurs; otherwise, step 144 occurs.

In step 140, a determination is made as to whether the instantaneous inference violates privacy preferences of the user. If a positive determination is made, step 142 occurs, wherein the answer is blurred. Otherwise, step 144 occurs. In step 144, a determination is made as to whether there is a high chance of an inference being made based upon historical information. If a positive determination is made, step 146 occurs; otherwise, step 150 occurs, wherein the query is answered completely. In step 146, a determination is made as to whether the inference violates the user's privacy preferences. If a positive determination is made, step 148 occurs, wherein the answer is blurred. Otherwise, step 150 occurs, wherein the query is answered completely.
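A hedged sketch of the FIG. 6 decision flow follows. The policy object and its methods are invented names standing in for the checks the flowchart describes; the instantaneous and historical risk values would come from the information entropy modeling engine 12 b.

```python
def handle_query(answer, policy, inst_risk, hist_risk, c):
    """Sketch of the FIG. 6 flow: reject, blur, or answer a query completely."""
    if policy.answer_violates_privacy(answer):         # step 134
        return policy.reject_or_blur(answer)
    if inst_risk > c and policy.instantaneous_inference_violates(answer):  # steps 138-140
        return policy.blur(answer)                     # step 142
    if hist_risk > c and policy.historical_inference_violates(answer):     # steps 144-146
        return policy.blur(answer)                     # step 148
    return answer                                      # step 150: answer completely
```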

To evaluate the impact of query rejection on system usability, simulations of scenarios and sequences of events for location-aware applications and computer-mediated communication were performed. In location-aware applications, fewer than ten percent of queries would not be accepted if the user didn't want to be exactly identified. Furthermore, instead of rejecting a query, the precision of the disclosed information about the user can be reduced (for example, location can be shown at the building level instead of the room level). Therefore, location-related inferences can be automatically managed by blurring or rejecting the queries without greatly degrading system usability. However, inferences in computer-mediated communications can happen more frequently, and users would like to be able to reveal their information if they are willing to do so. In such applications, this concern can be addressed by providing users with visualizations of the risk so that they can make informed information exchange decisions.
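For example, precision reduction of a disclosed location could be sketched as follows. This is a minimal illustration; the granularity levels and field names are hypothetical:

    def blur_location(location, level):
        # Reduce precision instead of rejecting the query outright:
        # keep only the fields at or above the permitted granularity.
        levels = ["campus", "building", "floor", "room"]
        allowed = levels[:levels.index(level) + 1]
        return {k: v for k, v in location.items() if k in allowed}

    loc = {"campus": "North", "building": "Library", "floor": "2", "room": "214"}
    blur_location(loc, "building")  # -> {'campus': 'North', 'building': 'Library'}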

FIGS. 7A-7D are simulation results showing various privacy risks associated with computer-mediated communication (CMC) systems. FIG. 7A shows the probability that a user's identity entropy is lower than its threshold. The y-axis shows the percentage of users for whom entropy was less than the threshold. The x-axis was chosen to represent the population because the size of the community strongly affects the inference probability. The depicted curves show this probability for desired degrees of anonymity of 2, 3, and 5 (entropy thresholds were calculated based on U=2, U=3, and U=5). In a user study, 80.8% of the users who wanted to stay anonymous desired a degree of anonymity of two (U=2), and 5.1% of them desired a degree of anonymity of three (U=3). As expected, increasing the population decreases this probability. As the figure shows, while in a small school the risk can be very high, in a campus of 10,000 students it is still about 50% in online chats between students. This means that even in a rather large school, users reveal information that 50% of the time could lead to the invasion of their desired degree of anonymity. Therefore, identity inferences can be quite prevalent in CMC.

FIG. 7B shows simulation results of a proximity-based application that shows nearby users by their nickname or real name, based on nearby users' privacy preferences. Anonymity invasions (identity inferences) happen when a user's real name or nickname is mapped to the person, or to a few individuals, using their nearby presence. Population density and the distribution of nearby people have an important impact on the inference risk. Based on the results, the mean number of people that subjects saw in their vicinity was 9.1, and its distribution is shown in FIG. 7B. Among the Poisson, Gaussian, exponential, Gamma, Lognormal, and Negative Binomial distributions, this distribution best fit the Negative Binomial distribution. We also measured the number of application users detected by the nearby application in the vicinity of each subject in each situation. The average number of nearby application users was 3.9, and its probability distribution is shown in FIG. 7B. These two measures are highly correlated (N=167, correlation coefficient ρ=0.92, statistical significance p<0.001), and the number of nearby people can be estimated as a linear function of the number of nearby application users with an RMS error of 1.6. Subjects' answers show that their background knowledge mostly consists of their visual information about their vicinity and the presence of nearby users. Therefore, the significant information available to the inferrer includes the names shown by the application and the physical appearance of current and past nearby users.
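A sketch of such a linear estimate follows. The paired observations below are hypothetical placeholders, not the study's data; only the fitting procedure is illustrated:

    import numpy as np

    # Hypothetical paired observations (nearby application users, nearby
    # people); the actual study used N=167 field observations.
    app_users = np.array([2, 3, 4, 5, 6])
    people = np.array([5, 7, 9, 12, 14])

    # Least-squares line: nearby people ~ slope * app_users + intercept.
    slope, intercept = np.polyfit(app_users, people, 1)
    predicted = np.polyval([slope, intercept], app_users)
    rms_err = np.sqrt(np.mean((predicted - people) ** 2))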

FIG. 7C shows the probability that a user is at risk of instantaneous identity inference in a proximity-based application. The y-axis shows the percentage of users whose identity entropy was lower than its threshold. The entropy threshold was calculated from the user's desired degree of anonymity, U, using the entropy equation. The x-axis represents the desired degree of anonymity. Each curve depicts the risk for a different mean nearby population density. The average density in the middle curve is equal to the average density of our experimental data. It can be seen that, assuming mass usage, the risk of identity inference is about 7% for a desired degree of anonymity of 3, and 20% for a desired degree of anonymity of 5. As expected, more crowded environments have a lower chance of identity inference risk.

FIG. 7D shows the same risk for two more general nearby-population distributions: a Gaussian distribution and a completely random spatial distribution of people (a Poisson distribution). Again, it can be seen that the risk is less than 30% in the worst case, which is for a desired degree of anonymity of 5 and an environment that is 30% less populated than our campus. These results are also confirmed by the results that we obtained from the user experiment. Simulation of the risk of historical inferences and experimental results show that, for a given population density, historical inferences happen less frequently than instantaneous inferences.

FIG. 8 is a diagram showing a classification tree system, indicated generally at 160, that can be implemented to model user context data, as well as to estimate inference risks. The system 160 includes a preprocessor software module 162, which processes a history of revealed user information and information to be revealed about a user, and a tree classification software tool 164, which processes an instantaneous inference function and a historical inference function generated by the preprocessor 162, as well as place, proximity, time, and demographic variables. The trees can be generated automatically using MATLAB. Gini's diversity index can be used to choose an outgoing branch. For reliability purposes, nodes may be required to have 100 or more observations in order to be split.
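One possible realization of such a tree is sketched below, purely for illustration; the embodiment described above used MATLAB, and the use of scikit-learn and the placeholder data here are assumptions of this sketch:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder features: columns stand for inst_inf_1, inst_inf_2,
    # hist_inf; labels mark whether an identity inference occurred.
    rng = np.random.default_rng(0)
    X = rng.random((500, 3))
    y = (X[:, 0] + X[:, 2] > 1.2).astype(int)

    tree = DecisionTreeClassifier(
        criterion="gini",       # Gini's diversity index for branch selection
        min_samples_split=100,  # nodes need >= 100 observations to be split
    )
    tree.fit(X, y)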

FIGS. 9-11 are diagrams showing classification results modeled by the system of FIG. 8. In a trial run, two instantaneous inference functions were produced: inst_inf_1 is the value of the instantaneous inference function INF₁(Q→Φ) where the number of possible values for a nearby user's identity, V, is set to the number of nearby users using the application; inst_inf_2 is the value of the instantaneous inference function where the number of possible values for a nearby user's identity, V, is set to the number of all nearby people. The value of the historical inference function, hist_inf, was calculated considering the history of co-proximity of the subject and the nearby user, up to two weeks prior.
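A sketch of the instantaneous inference function computation follows. The entropy value H_c and the candidate counts are illustrative; the formula follows INF₁(Q→Φ) = (H_max − H_c)/H_max, where H_max = log₂ V is the maximum entropy over V possible identities:

    import math

    def instantaneous_inference(v_possible_values, current_entropy):
        # Fraction of the maximum identity uncertainty that the revealed
        # information has eliminated: (H_max - H_c) / H_max.
        h_max = math.log2(v_possible_values)
        return (h_max - current_entropy) / h_max

    h_c = 1.0  # current identity entropy in bits (illustrative)
    inst_inf_1 = instantaneous_inference(4, h_c)  # V = nearby application users
    inst_inf_2 = instantaneous_inference(9, h_c)  # V = all nearby people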

First, only the inference functions were used as independent variables. The tree structure and the rate of correct classifications change based on the ratio of the cost of false positives, C_p (a false positive is when no inference happened in a situation but the tree classified the situation as high risk), to the cost of false negatives, C_n (a false negative is when an identity inference happened in a situation but the tree classified the situation as normal). We changed the cost of false negatives, C_n, relative to the cost of false positives, C_p, and obtained the upper curve depicted in FIG. 10. As shown in the figure, for C_n = 8·C_p the true positive rate is 85% and the true negative rate is 74%. Since correct guesses were made rarely with the questionnaires (about 12%), the false negative must be given a higher cost to produce a large true positive rate. An instance of the tree for C_n = 6·C_p is shown in FIG. 9. This tree uses the inference functions both individually and in combination. The tree basically means that the situation is of high risk if either the instantaneous or the historical inference function is too high (hist_inf > threshold T1 or inst_inf > threshold T2), or both are relatively high (hist_inf > T4 and inst_inf > T2, where T4 < T1).
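The asymmetric misclassification costs could, for example, be approximated with class weights. This is an illustrative mapping using scikit-learn, not the embodiment's own implementation:

    from sklearn.tree import DecisionTreeClassifier

    # Weighting false negatives eight times as heavily as false positives
    # (C_n = 8 * C_p) can be approximated by weighting the positive class,
    # where class 1 means "an identity inference occurred".
    tree = DecisionTreeClassifier(
        criterion="gini",
        min_samples_split=100,
        class_weight={0: 1, 1: 8},
    )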

In the second phase, only the time-, place-, and proximity-related information and demographic features were used as independent variables. It is noted that the proximity-related features include the number of nearby application users and the number of nearby people; the latter implies the number of possible values for a user's identity, V, in calculating the instantaneous inference functions. However, no feature directly measures the historical inference function. The correct classification rate of the decision tree versus different values of C_n is shown in the lower curve in FIG. 10. As shown in FIG. 10, for a given true negative rate, the true positive rate is on average 30% lower than the true positive rate in the previous phase.

An instance of the tree for C_n = 6·C_p is shown in FIG. 11. It has a greater depth than the tree obtained in phase one. In the final phase, all five categories of variables from both previous phases were used as independent variables. The difference in the success rate was less than 0.5%.

Having thus described the invention in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. What is desired to be protected is set forth in the following claims.

What is claimed is:
 1. A system for protecting individual privacy in a computer network, comprising: first means for modeling a context associated with an individual user and storing the modeled context in a data store; second means for calculating an information entropy level associated with a user and storing the calculated information entropy level in the data store; third means for calculating a privacy threshold associated with a user and storing the calculated privacy threshold in the data store; and fourth means for executing at least one privacy protection action based upon the modeled context, the calculated information entropy level, and the calculated privacy threshold.
 2. The system of claim 1, wherein the first means implements a deterministic model of background information associated with a user.
 3. The system of claim 1, wherein the first means implements a probabilistic model of background information associated with a user.
 4. The system of claim 1, wherein the first means models vicinity information about a user's vicinity.
 5. The system of claim 4, wherein the modeled vicinity information includes at least one of names of nearby persons, profiles of nearby persons, and information about nearby locations.
 6. The system of claim 1, wherein the first means models personal information about people nearby a user.
 7. The system of claim 6, wherein the personal information includes at least one of a user's demographic information, publicly-available information about people, gender information, ethnicity information, geotemporal routines, and individual interests/attributes.
 8. The system of claim 1, wherein the second means implements an instantaneous entropy model.
 9. The system of claim 8, wherein the instantaneous entropy model models instantaneous information entropy.
 10. The system of claim 8, wherein the instantaneous entropy model models instantaneous identity entropy.
 11. The system of claim 1, wherein the second means implements a historical entropy model.
 12. The system of claim 11, wherein the historical entropy model models historical information entropy.
 13. The system of claim 11, wherein the historical entropy model models historical identity entropy.
 14. The system of claim 1, wherein the third means determines and stores information about at least one of privacy preferences, anonymity preferences, group privacy preferences, system administrator settings, legal requirements, or social customs.
 15. The system of claim 1, wherein the privacy protection action implemented by the fourth means includes at least one of blurring an answer to a user query for information, rejecting an answer to a user query for information, alerting a user as to a privacy risk, informing the user about a current entropy level, informing the user about a history of revealed information, reminding the user about current privacy settings, adjusting the user's privacy settings, and adjusting system administration policy settings.
 16. The system of claim 1, wherein the second means implements an inference function of $INF_1(Q \rightarrow \Phi) = \frac{H_{max} - H_c}{H_{max}}$, where H_max represents a maximum entropy value and H_c represents a current entropy value.
 17. The system of claim 16, wherein the second means implements an inference function of $H_c = -\sum_{i=1}^{V} P_i \log_2 P_i$, where V is a number of entities having an attribute falling within a pre-defined sphere of influence, and P_i is a probability of a correct inference.
 17. The system of claim 16, wherein the secondmeans implements an inference function of$H_{c} = {- {\sum\limits_{1}^{V}\; {P\; {1 \cdot \log_{2}}P\; 1}}}$where V is a number of entities having an attribute falling within apre-defined sphere of influence, and P1 is a probability of a correctinference.